24 research outputs found

    Massive-Scale RDF Processing Using Compressed Bitmap Indexes

    The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL queries generate complex intermediate join queries; to compute these joins efficiently, we propose a new strategy based on bitmap indexes. We store the RDF data in column-oriented structures as compressed bitmaps along with two dictionaries. This paper makes three new contributions. (i) We present an efficient parallel strategy for parsing the raw RDF data, building dictionaries of unique entities, and creating compressed bitmap indexes of the data. (ii) We utilize the constructed bitmap indexes to efficiently answer SPARQL queries, simplifying the join evaluations. (iii) To quantify the performance impact of using bitmap indexes, we compare our approach to the state-of-the-art triple-store RDF-3X. We find that our bitmap index-based approach to answering queries is up to an order of magnitude faster for a variety of SPARQL queries, on gigascale RDF data sets.
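
    The idea of answering joins with bitmap indexes can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single dictionary over subjects, uses plain Python integers as uncompressed bitmaps, and handles only patterns of the form (?x, predicate, object). The function names `build_index` and `subjects_matching` and the sample triples are hypothetical.

```python
from collections import defaultdict

def build_index(triples):
    """Dictionary-encode subjects and build one bitmap per (predicate, object)."""
    subject_ids = {}              # subject string -> integer ID (the dictionary)
    bitmaps = defaultdict(int)    # (predicate, object) -> bitmap over subject IDs
    for s, p, o in triples:
        sid = subject_ids.setdefault(s, len(subject_ids))
        bitmaps[(p, o)] |= 1 << sid   # set the bit that stands for this subject
    id_to_subject = {i: s for s, i in subject_ids.items()}
    return bitmaps, id_to_subject

def subjects_matching(bitmaps, id_to_subject, patterns):
    """Join patterns (?x, p, o) on ?x by AND-ing their bitmaps."""
    result = -1                   # all bits set: the identity element for AND
    for p, o in patterns:
        result &= bitmaps.get((p, o), 0)
    return sorted(id_to_subject[i]
                  for i in range(result.bit_length()) if result >> i & 1)

# Hypothetical sample data, not taken from the paper.
triples = [
    ("alice", "type", "Person"), ("alice", "livesIn", "Paris"),
    ("bob", "type", "Person"), ("bob", "livesIn", "Rome"),
    ("carol", "type", "Person"), ("carol", "livesIn", "Paris"),
]
bitmaps, id_to_subject = build_index(triples)
people_in_paris = subjects_matching(
    bitmaps, id_to_subject, [("type", "Person"), ("livesIn", "Paris")])
```

    Here the two-pattern join reduces to one bitwise AND, which is the kind of simplification the abstract describes; a real system would additionally compress the bitmaps and parallelise index construction.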

    Semantic disambiguation and contextualisation of social tags

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-28509-7_18. This manuscript is an extended version of the paper ‘cTag: Semantic Contextualisation of Social Tags’, presented at the 6th International Workshop on Semantic Adaptive Social Web (SASWeb 2011). We present an algorithmic framework to accurately and efficiently identify the semantic meanings and contexts of social tags within a particular folksonomy. The framework is used for building contextualised tag-based user and item profiles. We also present its implementation in a system called cTag, with which we preliminarily analyse semantic meanings and contexts of tags belonging to the Delicious and MovieLens folksonomies. The analysis includes a comparison between semantic similarities obtained for pairs of tags in the Delicious folksonomy, and their semantic distances in the whole Web, according to co-occurrence based metrics computed with results of a Web search engine. This work was supported by the Spanish Ministry of Science and Innovation (TIN2008-06566-C04-02), and Universidad Autónoma de Madrid (CCG10-UAM/TIC-5877).

    Imposing a Semantic Schema for the Detection of Potential Mistakes in Knowledge Resources

    Society is becoming a complex socio-technical ecosystem requiring novel ICT solutions to provide the basic infrastructure for innovative services able to address key societal challenges and to assist people in their everyday activities. To tackle such complexity, there is a pressing need for very accurate, up-to-date and diversity-aware knowledge resources which can guarantee that results of automatic processing can be trusted enough for decision making processes. As the maintenance of such resources turns out to be very expensive, we argue that the only affordable way to address this is by complementing automatic checks with manual ones. This paper presents a methodology, based on the notion of semantic schema, which aims to minimize human intervention as it allows the automatic identification of potentially faulty parts of a knowledge resource which need manual checks. Our evaluation showed promising results.

    A Semantic Social Bookmarking System Based on a Wiki-Like Approach


    Relation Extraction from the Web Using Distant Supervision

    Extracting information from Web pages requires the ability to work at Web scale in terms of the number of documents, the number of domains and domain complexity. Recent approaches have tried to use existing knowledge bases, e.g. from the Linking Open Data cloud, to learn to extract information with promising results. In this paper we propose the use of distant supervision to learn to extract relations from the Web. Distant supervision is a method which uses background information from the Linking Open Data cloud to automatically label sentences with relations to create training data for relation classifiers. Although the method is promising, existing approaches are still not suitable for Web extraction as they suffer from three main issues: data sparsity, noise and lexical ambiguity. Our approach reduces the impact of data sparsity by making entity recognition tools more robust across domains, as well as extracting relations across sentence boundaries. We reduce the noise caused by lexical ambiguity by employing statistical methods to strategically select training data. Our experiments show that using a more robust entity recognition approach and expanding the scope of relation extraction results in about 8 times the number of extractions, and that strategically selecting training data can result in an error reduction of about 30%.
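
    The labelling step of distant supervision can be sketched as follows. This is an illustration under stated assumptions rather than the paper's pipeline: entities are found by naive case-insensitive substring matching instead of an entity recognition tool, and the function name `label_sentences`, the fact, and the sentences are hypothetical.

```python
def label_sentences(kb_facts, sentences):
    """Label a sentence with a KB relation when both of its entities appear in it."""
    labelled = []
    for sentence in sentences:
        text = sentence.lower()
        for subj, relation, obj in kb_facts:
            # Naive matching (an assumption of this sketch): substring lookup.
            if subj.lower() in text and obj.lower() in text:
                labelled.append((sentence, subj, obj, relation))
    return labelled

# Hypothetical background knowledge and sentences.
kb_facts = [("Paris", "capitalOf", "France")]
sentences = [
    "Paris is the capital of France.",
    "Paris Hilton visited Berlin.",
    "I flew from Paris to the south of France.",
]
training_data = label_sentences(kb_facts, sentences)
```

    The third sentence is labelled `capitalOf` although it does not express that relation. Such false positives are one kind of noise in automatically labelled data, which is why selecting training data carefully matters.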

    Overview of the INEX 2009 entity ranking track

    In some situations search engine users would prefer to retrieve entities instead of just documents. Example queries include "Italian Nobel prize winners", "Formula 1 drivers that won the Monaco Grand Prix", or "German spoken Swiss cantons". The XML Entity Ranking (XER) track at INEX creates a discussion forum aimed at standardizing evaluation procedures for entity retrieval. This paper describes the XER tasks and the evaluation procedure used at the XER track in 2009, where a new version of Wikipedia was used as the underlying collection, and summarizes the approaches adopted by the participants.

    Linked Data-Driven Smart Spaces

    In this paper, we present an approach to exploit Linked Data in Smart Spaces, doing more than just using RDF to represent information. In particular, we rely on knowledge stored in DBpedia, a dataset in the Web of Data. We also provide a platform to implement such an approach and an eTourism use case, both developed in collaboration with a mobile operator. Finally, we also provide a performance evaluation of the main component of the platform.

    Efficiently joining group patterns in SPARQL queries

    In SPARQL, conjunctive queries are expressed by using shared variables across sets of triple patterns, also called basic graph patterns. Based on this characterization, basic graph patterns in a SPARQL query can be partitioned into groups of acyclic patterns that share exactly one variable, or star-shaped groups. We observe that the number of triples in a group is proportional to the number of individuals that play the role of the subject or the object; however, depending on the degree of participation of the subject individuals in the properties, a group may be not much larger than a class or type to which the subject or object belongs. Thus, it may be significantly more efficient to independently evaluate each of the groups, and then merge the resulting sets, than to linearly join all triples in a basic graph pattern. Based on this observation, we have developed query optimization and evaluation techniques on star-shaped groups. We have conducted an empirical analysis on the benefits of the optimization and evaluation techniques in several SPARQL query engines. We observe that our proposed techniques are able to speed up query evaluation for join queries with star-shaped patterns by at least one order of magnitude.
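
    The evaluate-each-group-then-merge strategy can be sketched as below. The sketch makes simplifying assumptions that are not from the paper: stars are formed only around a shared subject term, each group is evaluated by a nested-loop scan over an in-memory triple list, and no cost-based optimization is attempted. All function names and the sample data are hypothetical.

```python
from collections import defaultdict

def is_var(term):
    return term.startswith("?")

def star_groups(patterns):
    """Partition triple patterns into subject-centred star-shaped groups."""
    groups = defaultdict(list)
    for pattern in patterns:
        groups[pattern[0]].append(pattern)   # patterns sharing a subject form a star
    return list(groups.values())

def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None                  # variable already bound to another value
        elif term != value:
            return None                      # constant does not match
    return new

def evaluate_group(group, triples):
    """Evaluate one star-shaped group independently of the others."""
    bindings = [{}]
    for pattern in group:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

def merge(left, right):
    """Join two sets of bindings on the variables they share."""
    return [{**l, **r} for l in left for r in right
            if all(l[v] == r[v] for v in l.keys() & r.keys())]

def evaluate(patterns, triples):
    result = [{}]
    for group in star_groups(patterns):
        result = merge(result, evaluate_group(group, triples))
    return result

# Hypothetical data and query.
triples = [
    ("alice", "worksAt", "acme"), ("alice", "name", "Alice"),
    ("bob", "worksAt", "acme"),
    ("acme", "locatedIn", "Paris"), ("acme", "type", "Company"),
]
patterns = [("?p", "worksAt", "?c"), ("?p", "name", "?n"),
            ("?c", "locatedIn", "Paris")]
answers = evaluate(patterns, triples)
```

    The query splits into one star around `?p` and one around `?c`; each is evaluated separately and the two small result sets are then joined on `?c`.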